 novel reinforcement learning algorithm


Novel Reinforcement Learning Algorithm for Suppressing Synchronization in Closed Loop Deep Brain Stimulators

arXiv.org Artificial Intelligence

Parkinson's disease is marked by altered and increased firing characteristics of pathological oscillations in the brain. In other words, it causes abnormal synchronous oscillations and suppression during neurological processing. Deep brain stimulators (DBS) are used to examine and regulate the synchronization and pathological oscillations in motor circuits. Although machine learning methods have been applied to the investigation of suppression, these models require large amounts of training data and computational power, both of which pose challenges to resource-constrained DBS. This research proposes a novel reinforcement learning (RL) framework for suppressing the synchronization in neuronal activity during episodes of neurological disorders with less power consumption. The proposed RL algorithm comprises an ensemble of a temporal representation of stimuli and a twin-delayed deep deterministic (TD3) policy gradient algorithm. We quantify the stability of the proposed framework to noise and reduced synchrony using RL for three pathological signaling regimes (regular, chaotic, and bursting) and further eliminate the undesirable oscillations. Furthermore, metrics such as evaluation rewards, energy supplied to the ensemble, and the mean point of convergence were used to compare the framework with other RL algorithms, specifically advantage actor-critic (A2C), actor-critic using Kronecker-factored trust region (ACKTR), and proximal policy optimization (PPO).
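To make the TD3 component more concrete, the following is a minimal sketch of the bootstrapped target that TD3 is generally known for: the smaller of two target-critic estimates, evaluated at a target action perturbed by clipped noise. It is not taken from the paper. The function name `td3_target`, the scalar action, and all default hyperparameters are assumptions chosen for illustration.

```python
import random


def td3_target(reward, done, next_state,
               actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Return the TD3 regression target for one transition (scalar action).

    actor_target(state) -> action and critic*_target(state, action) -> value
    are hypothetical callables standing in for the target networks.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    next_action = actor_target(next_state) + noise
    next_action = max(action_low, min(action_high, next_action))

    # Clipped double Q-learning: take the smaller of the two critic estimates.
    q_min = min(critic1_target(next_state, next_action),
                critic2_target(next_state, next_action))

    # No bootstrapping past a terminal transition (done == 1).
    return reward + gamma * (1.0 - done) * q_min
```

Both critics would then be regressed toward this value, while the actor is updated less frequently than the critics (the "delayed" part of TD3).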


AlphaZero, a novel Reinforcement Learning Algorithm, deployed in JavaScript

#artificialintelligence

In this blog post, you will learn about and implement AlphaZero, an exciting and novel reinforcement learning algorithm that has been used to beat world champions in games like Go and chess. You will use it to master a pencil-and-paper game (Dots and Boxes) and deploy it in a web app, entirely in JavaScript. AlphaZero's key and most exciting aspect is its ability to reach superhuman performance in board games without relying on external knowledge. AlphaZero learns to master the game by playing against itself (self-play) and learning from those experiences. We will leverage a Python version of AlphaZero by Surag Nair, available on GitHub and described as a "simplified, highly flexible, commented, and easy to understand implementation". You can go ahead and play the game here. The web app and JavaScript implementation are available here. This code was ported from this Python implementation.
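As a rough illustration of how self-play produces training data, the sketch below plays one game of a toy take-away game (remove 1 or 2 stones; whoever takes the last stone wins) and labels every visited position with the final outcome from the perspective of the player who moved there. The toy game, the uniform placeholder policy, and the name `self_play_episode` are assumptions made for this example. It is written in Python to match the implementation the post builds on, and the real AlphaZero chooses moves with a neural-network-guided tree search rather than a fixed policy.

```python
import random


def uniform_policy(pile):
    """Placeholder policy: equal probability for each legal move (1 or 2 stones)."""
    moves = [m for m in (1, 2) if m <= pile]
    return {m: 1.0 / len(moves) for m in moves}


def self_play_episode(pile=5, policy=uniform_policy, rng=None):
    """Play one game against itself and return (state, policy, outcome) examples."""
    rng = rng or random.Random(0)
    history = []          # (pile before the move, player to move, move probabilities)
    player = 1            # players are +1 and -1 and alternate turns
    while pile > 0:
        probs = policy(pile)
        moves = list(probs)
        move = rng.choices(moves, weights=[probs[m] for m in moves])[0]
        history.append((pile, player, probs))
        pile -= move
        player = -player
    winner = -player      # the player who took the last stone
    # Label each position with +1 if its mover eventually won, otherwise -1.
    return [(state, probs, 1.0 if mover == winner else -1.0)
            for state, mover, probs in history]


if __name__ == "__main__":
    for example in self_play_episode():
        print(example)
```

Examples of this shape are what a policy-and-value network would be trained on: the recorded move probabilities as the policy target and the outcome as the value target.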


Personalized Medical Treatments Using Novel Reinforcement Learning Algorithms

arXiv.org Machine Learning

In both computer science and medicine, there is very strong interest in developing personalized treatment policies for patients who have variable responses to treatments. In particular, I aim to find an optimal personalized treatment policy, a non-deterministic function of the patient-specific covariate data that maximizes the expected survival time or clinical outcome. I developed an algorithmic framework to solve multistage decision problems with a varying number of stages that are subject to censoring and in which the "rewards" are expected survival times. Specifically, I developed a novel Q-learning algorithm that dynamically adjusts for these parameters. Furthermore, I found finite upper bounds on the generalization error of the treatment paths constructed by this algorithm. I have also shown that when the optimal Q-function is an element of the approximation space, the anticipated survival times for the treatment regime constructed by the algorithm will converge to those of the optimal treatment path. I demonstrated the performance of the proposed algorithmic framework via simulation studies and through the analysis of chronic depression data and a hypothetical clinical trial. The censored Q-learning algorithm I developed is more effective than state-of-the-art clinical decision support systems and is able to operate in environments where many covariate parameters may be unobtainable or censored.
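For readers unfamiliar with Q-learning, the sketch below shows the standard tabular update that such methods start from. It is not the censored Q-learning algorithm described in the abstract and does not handle censoring or function approximation. The name `q_update`, the dictionary-based table, and the default step size are assumptions for illustration only.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=1.0):
    """Apply one standard Q-learning update and return the new Q(state, action).

    q is a mapping from (state, action) pairs to estimated values.
    An empty next_actions list marks the final stage, so nothing is bootstrapped.
    """
    # Value of the best action available at the next stage (0 if there is none).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q = defaultdict(float)
    # Hypothetical two-stage example: the reward is survival time gained per stage.
    q_update(q, "stage2", "treatment_b", 4.0, None, [], alpha=1.0)
    q_update(q, "stage1", "treatment_a", 1.0, "stage2",
             ["treatment_a", "treatment_b"], alpha=0.5)
    print(dict(q))
```

In a treatment setting, states would correspond to patient covariates at each stage and actions to the available treatments; the names used in the usage example are invented for the demonstration.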